For each lineage a maximum distance (in days) between any of two sequences in cluster were calculated (D) for each cluster and compared with a reference lineage BA.2. The y-axis denotes the difference between mean D values comparing a lineage and BA.2 expressed as relative change. The positives values indicates greater stability and negative values denote lower stability using BA.2 lineage as a reference. The upper panel denotes significances of comparisons using different statistical tests, the red color denotes cases were the difference between a lineage and BA.2 lineage was found to be of low significance (p > 0.05)
The upper panel denotes results of pairwise comparisons when all
lineages were compared to the reference lineage “BA.2”. The Y axis
denotes the analyzing all together or each lineage a maximum distance
(in days) between any of two sequences in cluster were calculated (D)
for each cluster and compared with a reference lineage BA.2. The y-axis
denotes the difference between mean D values comparing a lineage and
BA.2 expressed as relative change. The positives values indicates
greater stability and negative values denote lower stability – the
values are indicated at the bottom of the plot. The lower panel denotes
results of one way ANOVA based on zero-inflated Negative Binomial
distribution. The y-axis denotes estimated marginal (EM) mean of D and
the error bar denotes the the 95% confidence interval of the EM mean.
The values of EM mean for each lineage are indicated at the bottom of
the plot.
The left panel (a) shows time trend of stability calculated as percentage difference compared to BA.2 lineage. The size of the dots denote the size of the cluster used for analysis, labels by the dots indicated the lineage and the color indicates results of Wilcoxon pairwise test with alternative hypothesis “not equal” comparing against BA.2. Also the corresponding Pearson correlation is indicated along with its significance estimate. The right panel (b) shows time trend of stability represented by the lower 95% confidence limit of EM mean (asymnp.LCL). The size of the dots denote the size of the cluster used for analysis, labels by the dots indicated the lineage. Also the results of Pearson correlation for the time trend of EM mean and its lower 95% confidence limit are indicated at the top and bottom.
In the panels a and b x-axis denotes a lineage which is indicated in the panel b. The lineages are ordered from the oldest to the newest ones. On the y-axis of panel a, the Fde (fraction of clusters of antibodies) that reveals significant differences is shown, comparing entropy values between residues in contact with antibodies and those not in contact, with the significance determined by the Wilcoxon test. The y-axis of panel b shows absolute changes in entropy values comparing positions for contacting and non contacting residues with the set of contacting residues was composed based on all antibody clusters. The y-axis in panel c denotes the Fde and the x-axis denotes month at which a lineage reached its maximum abundance.
The left panel denotes the positions for each analyzed lineages. The
lineages are indicated by the facet labels (y-axis). The sequence
positions are indicated in the x-axis. The size of dots denotes number
of structural antibodies clusters that contacts with a position. Color
denotes positional sequence cluster derived from hierarchical clustering
with complete linkage, flattening cut-off based on 100 sequence
positions. The right panel denotes changes over time in number of
contacting structural antibodies clusters for each positional cluster.
The facet labels and colors denote the positional clusters, the colors
in both right and left panels matches the same positional cluster. Each
dot in the right panel denote a lineage, x-axis denotes the month when
the abundance of a lineage peaked. On the top of the results of
corresponding Spearman correlation test are indicated.
The left panel (a) denotes the positions for each analyzed lineages. The lineages are indicated by the facet right labels, the scope of the analysis is indicated by the left labels. “all” denotes analysis based on all GISAID data, “lt” denotes analysis using subset of Lithuanian sequences. The sequence positions are indicated in the x-axis. The size of dots denotes number of structural antibodies clusters that contacts with a position. Color denotes positional sequence cluster derived from hierarchical clustering with complete linkage, flattening cut-off based on 100 sequence positions. The right panel denotes match between positions that were detected as having increased number of antibodies contacting residues based on worldwide and Lithuanian lineages. The positions were considered to be matching if distance between them was less than 10 sequence positions apart.
X-axis matches a month and Y-axis matches the number of mutations at the positions of the contacting antibodies residues. Panels a and b denotes number of mutations considering all strains. Upper panels (a,c) denote worldwide data, lower panels (b,c) denote a subset matching to Lithuania. Panels c and d denotes only data matching the most abundant month-wise lineages. The pink transparent line in panels a and b denotes number of GISAID deposited sequence originating either from all world or Lithuania respectively.
The left panel (a) denotes changes over time for each analyzed cluster individually. The top facet denotes the set of positions matching unipn of all clusters. The cluster-wise facets from top to bottom are ordered based on tau values of Mann-Kendall test for trend test of time derived from worldwide GISAID sequences. If the corresponding p > 0.05, tau values were not indicated. The tau values “NA” denotes cases with insignificant time related trends. The right panel (b) denotes regions of sequence positions along protein S that match the epitopic region of the structural antibodies clusters. These regions result from hierarchical clustering with complete linkage, flattening cut-off based on 100 sequence positions.
The plots denotes changes over time for each analyzed cluster individually. The top facet denotes the set of positions matching union of all clusters. The cluster-wise facets from top to bottom are ordered based on tau values of Mann-Kendall test for trend test of time derived from worldwide GISAID sequences. If the corresponding p > 0.05, tau values were not indicated. The tau values “NA” denotes cases with insignificant time related trends.
It would require enormous computational power to predict changes in binding to antibodies due to mutations for all observed combinations of mutations. The set for testing was composed as follows:
Below the box plots of the the evaluations of ΔΔG biding energy terms are given: • our developed evaluation method (denoted as “ddG”); • changes in binding energy calculated by EVOEF1 (denoted as “dEVOEF1”); • changes in binding energy calculated using PRODIGY energy function (denoted as “dPRODIGY”).
Notes on mutant structures generation: only impact of substitution and deletion are evaluated. Structures containing only substitutions are generated based on EvoEF1 program, modeling of deletion/substitution mix are generated by ProMod3. In both cases rotamers are optimized using FASPR.
Results of our developed evaluation method is denoted as “ddG”. “dEVOEF1” denotes changes in binding energy calculated by EvoEF1 program. Changes in binding energy calculated based on PRODIGY energy function is denoted as “dPRODIGY”.
The corresponding data in tabular format is given in the table below. If a sequence is sampled from the sequences having the highest number of mutations it is denoted as “increased” in the column “contacts_class” (sampling is described above).
It would require enormous computational power to predict changes in binding to ACE2 receptor due to mutations for all observed combinations of mutations. The set for testing was composed as follows:
Below the box plots of the the evaluations of ΔΔG biding energy terms are given: • our developed evaluation method (denoted as “ddG”); • changes in binding energy calculated by EVOEF1 (denoted as “dEVOEF1”); • changes in binding energy calculated using PRODIGY energy function (denoted as “dPRODIGY”).
Notes on mutant structures generation: only impact of substitution and deletion are evaluated. Structures containing only substitutions are generated based on EvoEF1 program, modeling of deletion/substitution mix are generated by ProMod3. In both cases rotamers are optimized using FASPR.
Results of our developed evaluation method is denoted as “ddG”. “dEVOEF1” denotes changes in binding energy calculated by EvoEF1 program. Changes in binding energy calculated based on PRODIGY energy function is denoted as “dPRODIGY”.
The corresponding data in tabular format is given in the table below. If a sequence is sampled from the sequences having the highest number of mutations it is denoted as “increased” in the column “contacts_class” (sampling is described above).